This paper is the first of a series aimed at developing a theory of early visual
processing in reading. We suggest that there has been a close parallel
in the development of theories of reading and theories of vision in
Artificial Intelligence. We propose to exploit and extend recent results in
Computer Vision to develop an improved model of early processing in reading.

This first paper considers the problem of isolating words in text based on
the information which Marr and Hildreth's (1980) theory asserts is available
in the parafovea. We show in particular that the findings of Fisher (1975) on reading
transformed texts can be accounted for without postulating the need for complex
interactions between early processing and downflowing information as he suggests.

The paper concludes with a brief discussion of the problem of integrating information
over successive saccades, and relates the earlier analysis to the empirical findings
of Rayner.

1. Introduction


This paper presents computational and psychophysical evidence in support of a theory of one
of the earliest stages of visual processing in reading, namely the isolation of words in text. As such
it is the first step in the development of a computational theory of reading whose general direction is
presented in the next section. A skeletal outline of the paper follows.

The goal of reading may be supposed to be the efficient extraction of meaning from imaged
text. Realising this goal involves integrating "upward flowing" information uncovered by early visual
processing with "downward flowing" cognitive interpretations. In this paper, we present an approach
toward understanding the visual aspects of reading which we believe may contribute greatly to an
understanding of the overall reading process.

Existing theories of reading have relied on a primitive model of early visual processing. We
suggest that as a result they have typically accorded too much emphasis to the role of "downward
flowing" cognitive information, in effect suggesting that its deployment is necessary for almost every
aspect of reading. Indeed, over the past two decades there has been a close parallel between the
development of theories of reading and theories of visual perception in Artificial Intelligence (AI).
In particular, we note that a number of reading theorists have recently been attracted to complex
processing models developed in AI. A major attraction of such models is that they seem to provide
a mechanism supporting flexible behavior by which information available as a result of early visual
processing could combine with downflowing information about the specific image domain to produce
an interpretation or percept. Still more recently, AI has witnessed a fascination with relaxation style
processing. This is not only claimed to support the interaction between low level and downflowing
information, but to do so by local parallel interaction. A number of reading theorists have proposed
similar mechanisms. For the most part, these theories have had limited success in explaining the
empirical psychophysical data on reading. We argue that this is, in part, because they depend upon a
primitive model of early visual processing. It is also partly because of an emphasis on the mechanism
of integrating information from various sources, without addressing the issues of what purpose the
information serves, what information is passed, and how it is represented (see Marr, 1980;
Marr and Nishihara, 1978).

Over the past few years there has been considerable progress in understanding early visual
processing. The achievements of Horn, Marr, Poggio, Ullman, and others in developing a computational
theory of natural visual perception have little or no counterpart in theories of reading. For
example, Frisby (1979, page 10X) and Allport (19811, page 21S) equate early processing with feature
extraction as developed in optical character recognition systems (Duda and Hart, 1971). A fuller
account of the relevant empirical findings is given in Cohen (1978, page 65), but her analysis falls
considerably short of being a precise and coherent theory. The computational theory of natural
vision suggests that much richer information can be made available by early visual processing in
reading, without the aid of downward flowing "higher level" knowledge of the domain being viewed.
Reading has always attracted a great deal of attention from perceptual psychologists, in part because
of the light it might shed on our understanding of human perception of the natural world. We claim
that, temporarily at least, the boot is on the other foot, and that the recent developments in our
understanding of real world perception can be gainfully applied to increase our understanding of
reading.

Finally, we review some empirical findings about the earliest stages of visual processing in reading,
and we settle upon the isolation of words as the first goal of the reader's perceptual processing.
We note that eye movement studies show that a great deal of processing is carried out on text prior
to foveation. It follows that it is reasonable to conjecture that word isolation is effected on the basis
of information available in the parafovea. As part of an investigation of this conjecture, we suggest
that Fisher's (1975) results on transformed text provide some insight into parafoveal word isolation,
and so we analyze his results carefully. We argue that they can be explained on the basis of Marr
and Hildreth's (1980) theory of edge detection without postulating the need for "higher order visual
processing" as was claimed by Fisher. The explanation leads to a number of empirical predictions,
which are confirmed using Fisher's own methods and materials. The concluding section sketches a
theory of word isolation in the parafovea, and notes that the decision to activate the reading process in
the first place is also not very mysterious.

2.  Background  to  the  study 
2.1  Past  approaches  to  theories  of  reading. 

From the earliest days of experimental psychology there has been a constant stream of research
findings on reading (see for example Huey, 1908; Henderson, 1977). All of the major schools
of perception have considered reading to some extent, and have attempted to exploit various
mathematical and computational insights to develop their theories. We are particularly concerned with the
growth of interest over the past two decades, during which time a number of theories have developed,
the majority being expressed in terms of information processing.

Relative to the Behaviorists' reliance on a simple mechanism which bore many of the
characteristics of early pattern recognition systems, and the extreme wordiness of the Gestalt and New
Look theorists, information processing accounts of reading are refreshingly precise. They consist
of individuated stages, at which some particular functionally defined "process" is carried out (say to
extract features or to consult a lexicon), together with interconnecting arrows which represent the
flow of information through the system under consideration. An important property of such models
is that they describe the way in which a perceptual or cognitive process being studied unfolds over
time. The particular class of individuated stage processes, and the topology of interconnecting arrows,
are carefully chosen to account for relevant empirical findings. While the power of such formalisms is
clearly sufficient to account for any given set of descriptions, in the absence of a wholly precise
mathematical or computational account of reading, any particular model is inevitably vague in places. The
extent to which it does or does not adequately explain the available empirical data (and the precision
of the predictions which can be made from it) are limited. For example, Gough (197?) presents a
flow diagram of "one second of reading" which embodies the theory that phonological recoding is
obligatory. Marcel and Patterson (1979) present an alternative in which it is not. For further examples,
see Estes (1977), Cohen (1978), and McClelland and Rumelhart (1980).

The box and arrow diagrams which feature in most information processing accounts of perception
are highly reminiscent of the system flowcharts which used to be prepared by programmers in
the early stages of developing a program. Flowcharts have fallen into disrepute in computer science
as it has been realized that they provide an impoverished representation of such a key issue as the
structure of a program. They are also wholly inadequate as a representation of process interaction and
parallelism, being essentially restricted to the description of a single sequential process. Of course,
they are merely the simplest first approximation to a model of processing, though one should be
aware of the Computer Science experience that they unacceptably straitjacket thinking.

Several authors have argued that it is not possible to develop a theory of an ability such as
reading, in which the flow of information is wholly unidirectional, that is, a flow that proceeds from
the processes which embody relatively general knowledge, and which make contact with the intensity
levels of the image, to the processes embodying knowledge about the specific objects and situations
depicted in the image (see for example Allport (1979), Frisby (1979), Cohen (1978), Rumelhart (1977)).
It is supposed that "downward flow" of knowledge about such objects and situations is also necessary
to account for the remarkable abilities and flexibility of human perception.

The invocation of "downward flow" as an explanation for reading abilities has an interesting
(perhaps not coincidental) parallel with the history of computational theories of natural visual
perception in the field of Artificial Intelligence (AI). The period 1963 to the early 1970s in the
development of AI was most notable for extensive experimentation with edge detecting or region finding
operators, designed ad hoc in accordance with the needs of some particular project. Authors time and
again noted that the results of applying their operators to digitized images were essentially
unpredictable: many concluded that it was simply not possible to develop a theory of early visual processing
capable of generating predictably rich and useful descriptions that could then be used as the basis for
computing the visible surfaces and objects in a scene. It was supposed therefore that, just as in the
case of reading (although the AI workers involved would not have known of the parallel), "downward
flow" of knowledge about the objects and situations imaged in the scene was essential to explain the
remarkable abilities of human visual perception. The interaction between upward flowing information
generated by relatively unknowledgeable early processing modules and downward flowing
information was essentially dynamically determined and could not be completely defined in advance. It was
conjectured by Minsky and Papert (1972) that among the tools developed in computer science, the
best way to achieve this dynamically determined behavior was through process interactions, which,
it was noted, need not be restricted to the simple patterns of (serial) activity provided in a language
like Fortran or Algol. These were the considerations which lay behind the development of a rash
of complex "heterarchical" programs to understand natural language, perceive utterances from a
speech signal, and see in various narrowly defined domains. Programs such as Hearsay 2 (Lesser
and Erman, 1977), Margie (Schank et al., 1973), Barrow and Tenenbaum's (1976) Interpretation
Guided Semantics, and the author's own program for "reading" Fortran code (Brady, 1979; Brady and
Wielinga, 1978) are typical of the genre.


The development of complex "heterarchical" programs such as Margie and Hearsay 2 is
paralleled by the adoption of those computational models of processing by reading theorists eager to
explain the use of downward and upward flow as determinants of a percept. Examples are Cohen's
(1978) discussion of Speechlis (Nash-Webber, 1975), and Allport's (1979) detailed explanation of the
operation of Margie.

In fact, a number of difficulties emerged in the dynamic processing account of perception as
soon as vague theoretical notions like "process interaction" needed to be made precise (see Brady,
1979). There are two basic difficulties, one technical, the other more empirical in nature though
reflecting a theoretical shortcoming. Technically, the potency of process interactions, and the stock
of ideas about how to control and analyze them, remain very limited indeed. Secondly, and most
notably, the presumed power of heterarchy never materialized. It repeatedly became evident that a
small increase in the early processing capabilities of programs could have a far greater impact on the
performance of a program as a whole than a vastly greater amount of "higher level reasoning".

Consider in particular the case of Hearsay 2 (Lesser and Erman, 1977). One of the main
innovations of Hearsay 2 was the introduction of a centralized data structure called the "blackboard", on
which the findings of a number of "knowledge sources" (which performed such tasks as isolating
phonemes, syllables, words, or larger syntactic units) were presented. At any stage of the processing
of a speech signal corresponding to an utterance, the contents of the blackboard represented the state
of the system's interpretation. The addition of a piece of information by one knowledge source could
enable the activity of several others. At any given stage, there were typically many runnable processes
(up to two hundred), each of which was assigned a numerical priority value indicating its apparent
importance. This design is illustrated in figure 1a, which shows the Hearsay 2 system as of January
1976. The authors note that "this implementation had poor performance (eg 10% of sentences correct
in 85 million instructions per second of speech on a 250 word vocabulary)" (Lesser and Erman 1977,
page 790). A second design, shown in figure 1b, was aimed at "making the lower levels of processing
more sequential and bottom up" (Lesser and Erman 1977, page 795). The authors reported that "this
configuration performs substantially better (eg 85% correct in 60 million instructions per second of
speech on a 1000 word vocabulary)" (Lesser and Erman 1977, page 790).
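
The blackboard organisation described above can be illustrated with a toy sketch. It is not a reconstruction of Hearsay 2: the class `Blackboard`, its methods, the three knowledge sources, the level names, and the priority numbers are all hypothetical. It shows only the two properties mentioned in the text, namely that one new entry can enable several knowledge sources, and that runnable activations are ordered by a numerical priority.

```python
import heapq

class Blackboard:
    """Toy blackboard: a shared list of hypotheses plus a priority agenda."""

    def __init__(self):
        self.entries = []   # hypotheses posted so far, as (level, value)
        self.sources = []   # registered (trigger_level, priority, action)
        self.agenda = []    # heap of runnable knowledge-source activations
        self._count = 0     # tie-breaker so the heap never compares actions

    def register(self, trigger_level, priority, action):
        self.sources.append((trigger_level, priority, action))

    def post(self, level, value):
        entry = (level, value)
        self.entries.append(entry)
        # One new entry may enable several knowledge sources at once.
        for trigger_level, priority, action in self.sources:
            if trigger_level == level:
                heapq.heappush(self.agenda, (-priority, self._count, action, entry))
                self._count += 1

    def run(self):
        """Run activations in priority order; return the names in run order."""
        order = []
        while self.agenda:
            _, _, action, entry = heapq.heappop(self.agenda)
            order.append(action.__name__)
            for level, value in action(entry):
                self.post(level, value)
        return order

# Hypothetical knowledge sources, each returning new (level, value) entries.
def syllables_from_phonemes(entry):
    return [("syllable", "".join(entry[1]))]

def words_from_syllables(entry):
    return [("word", entry[1].upper())]

def log_phonemes(entry):
    return []

bb = Blackboard()
bb.register("phoneme", 5, syllables_from_phonemes)
bb.register("phoneme", 1, log_phonemes)
bb.register("syllable", 3, words_from_syllables)
bb.post("phoneme", ("k", "a", "t"))
order = bb.run()
```

Even in this toy form the order of execution depends on what has been posted so far, which gives some feel for why such systems were hard to analyze and control.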

Figure 1. The structure of the blackboard state descriptor for the Hearsay 2 speech understanding
system. Figure 1a: the system as of January 1976. Figure 1b: the second version as of September
1976. (Reproduced from Lesser and Erman 1977.)

Some AI researchers (see for example Davis and Rosenfeld 1978, 1981; Barrow and Tenenbaum
1978; Rosenfeld, Hummel, and Zucker, 1976; Waltz, 1978; Zucker 1978) concluded that the main
drawback of the heterarchical process organisations discussed above was that they were essentially
serial. They argue that much of their complexity arises because one is forced to choose a particular
sequential order in which to carry out a number of processes. Since this order is inevitably often
inappropriate (being unpredictable), one is then required to incorporate sufficient mechanism to facilitate
recovery. Instead, such authors suggest the use of globally constrained local parallel processes, usually
based on relaxation or other forms of nonlinear programming (see Luenberger, 1973). Note that in
common with the heterarchy approach, the structure of the mechanism is developed and fixed in
advance of the analysis of the particular perceptual problem being studied. The only issues which the
theorist is left to settle in most accounts are parameter settings, such as the size of neighborhoods,
thresholds, and the like (see Davis and Rosenfeld, 1981). We argued above that a major drawback
with heterarchical accounts of perception was the difficulty in analysing and controlling them. It is
important to realise that analogous problems arise with relaxation processes. It is usually extremely
hard to guarantee that such a process settles down to a steady state ("converges"). As an example,
consider the difficulty that Marr, Palm, and Poggio (1978) had in analysing the behavior of the Marr
and Poggio (1976a, 1976b) cooperative algorithm for computing stereo disparity. If this is difficult for
a single level of relaxation processing, it is considerably more so for the hierarchical or multi stage
processes which have been advanced, though usually not implemented and tested, in the literature (eg
McClelland and Rumelhart, 1980; Davis and Rosenfeld, 1978; Zucker, 1978). Few (if any) results are
known regarding the convergence (including speed of convergence) of such relaxation processes (see
Ullman, 1979; Zucker, Leclerc, and Mohammed, 1979). Without such results, the uncritical proposal
of complex locally parallel processes is of questionable significance.

2.2  The  computational  approach  to  vision 

Against this background of ad hoc experimentation and the construction of uncontrollable complex
processing models in Artificial Intelligence, the computational theory of natural visual perception
developed by Horn, Marr, Ullman, Poggio, Binford, and others is quite remarkable. A fuller account
of the current state of computer vision can be found elsewhere (Marr, 1980; Brady, 1981; Horn, 1978;
Marr and Poggio, 1979; Marr and Hildreth, 1980; Grimson, 1980). For the purposes of this article,
it is sufficient to note that there now are mathematically precise theories and highly parallel, robust
computer implementations of a variety of (human) visual processes. These include edge detection,
stereopsis, shape from shading, shape from texture, early motion detection, and surface interpolation.
In each case these theories concern processes which occur at an early stage of perception, and they
embody knowledge about the world which is of considerable generality, for example that the world
mostly consists of smooth surfaces. In short, the computational theory of vision is a compelling
argument in support of the power of early visual processing. More significantly perhaps, it promotes
a research methodology which defers consideration of knowledge rich, domain specific, downward
flow of information until the considerable scope of early processing is more clearly understood. It also
makes little sense to develop an understanding of the role of downward flow until we have a better
appreciation of what information early processing can and does provide.

The computational theory of visual perception referred to above is also interesting for the
research methodology which has developed from it. The first step is to isolate a perceptual ability for
which there is empirical evidence for considerable competence on the basis of early processing. For
example, Horn (1974) has studied the determination of lightness and the computation of shape
from shading (1978) from an image. Marr and his colleagues have considered edge detection (Marr
and Hildreth, 1979), stereopsis (Marr and Poggio (1979), Grimson (1980)), and motion computation
(Ullman (1978), Marr and Ullman (1979), Ullman and Richter (1980)). The particular problem is
then studied in three stages. We consider what information must be extracted from the scene in
order for the system to exhibit this competence, and what constraints on the world the system needs to
assume in order to extract this information. The next step is to devise a representation which makes
explicit the information required to explain the competence. Only then is it reasonable to devise
algorithms to discover the appropriate representation instance for a scene. Finally, one can conduct
experiments to discover the extent to which the algorithm explains human performance. Notice that
in contrast to this methodology, the heterarchical and relaxation processes outlined above start with
an algorithm (or commitment to a particular restricted kind of processing) and only then examine
competence, devise representations, and analyze the basis of the competence.

2.3  Edge  detection  in  the  human  visual  system 

As an example of the results of the computational approach to early visual processing, we take
a brief look at Marr and Hildreth's (1980) theory of edge detection. The reason for this choice is
quite simple. The theory addresses the very first stage of analysis of the visual input, and this is the
stage which is most relevant to the study of parafoveal processing in reading which is presented in the
balance of the paper.

Marr and Hildreth (1980, page 189) point out that "a major difficulty with natural images is that
changes can and do occur over a wide range of scales, so it follows that one should seek a way of
dealing with the changes occurring at different scales." One way to do this, which has been proposed
several times in the image processing literature, is to pass the image through a number of band limited
filters. Of course, the difficult issues concern the choice of filters (bar mask, Fourier, Gaussian), the
number of them, and the exact band pass characteristics of each.

In fact, intensity changes are mostly localised in space, a fact which can be explained by their
physical causes (see Horn (1977), Marr (1976), Marr and Hildreth (1980, page 189)). They are also
localised in the frequency domain, since the world is mostly composed of visible surfaces of roughly
uniform texture. Marr and Hildreth (1980, page 191) note that "unfortunately, these two localization
requirements, the one in the spatial and the other in the frequency domain, are conflicting". They
point out that the Gaussian optimises localisation in both domains simultaneously, and so it is chosen
as the band limiting filter in the theory.

In order to locate edges, one can either find places where the first derivative of the intensity
function reaches a maximum, or equivalently where the second derivative is zero. To locate edges at
arbitrary orientations with equal facility, we require a differential operator which is not directional.
The Laplacian is the only first or second order differential operator with this property. Thus the
Marr and Hildreth theory asserts that following Gaussian smoothing, the image is convolved with a
Laplacian and zero crossings noted. In fact, by the so-called convolution theorem,

$$\nabla^2 (G * \mathrm{Image}) = (\nabla^2 G) * \mathrm{Image},$$

where G is a Gaussian operator, and * denotes convolution. Marr and Hildreth (1980, page 193) point
out that the $\nabla^2 G$ operator closely resembles the difference of Gaussian (DOG) operators proposed
by Wilson and Giese (1977) (see also Wilson and Bergen, 1979). Indeed they show that $\nabla^2 G$ is the
limit of a DOG, and that the DOG closely approximates it. Wilson and Bergen's work suggests that
there should be four bandpass channels at each retinal eccentricity, and that their characteristic sizes
should scale linearly with eccentricity, being smallest in the fovea and doubling in size by about f.
Recently, Marr, Hildreth, and Poggio (1979) have noted evidence for a fifth, smaller channel in the
fovea, and Stevens (1980) has shown that the fifth, finest resolution channel plays the most important
role in determining the information we compute foveally.
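
The operator and its DOG approximation can be tried out numerically. The following is a minimal one-dimensional sketch, not the authors' program: it convolves a step edge with the second derivative of a Gaussian (the 1-D analogue of the Laplacian of a Gaussian) and reads off the zero crossing, then measures how closely a DOG resembles that operator. The widths used, the width ratio of 1.6 between the two Gaussians, and the `min_step` threshold are assumptions chosen for illustration.

```python
import numpy as np

def gaussian(x, sigma):
    """Unit-area 1-D Gaussian."""
    return np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))

def second_derivative_of_gaussian(x, sigma):
    """1-D analogue of the Laplacian of a Gaussian."""
    return (x**2 / sigma**4 - 1.0 / sigma**2) * gaussian(x, sigma)

def zero_crossings(response, min_step=1e-3):
    """Indices i where the response changes sign between samples i and i+1.

    min_step (an assumed threshold) discards sign changes caused by
    numerical noise in flat regions.
    """
    r = np.asarray(response)
    sign_change = r[:-1] * r[1:] < 0
    steep_enough = np.abs(r[1:] - r[:-1]) > min_step
    return np.nonzero(sign_change & steep_enough)[0].tolist()

def difference_of_gaussians(x, sigma_narrow, ratio=1.6):
    """Wide Gaussian minus narrow Gaussian (sign chosen to match the operator above)."""
    return gaussian(x, ratio * sigma_narrow) - gaussian(x, sigma_narrow)

def normalized_correlation(a, b):
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

# A step edge lying between samples 99 and 100.
intensity = np.zeros(200)
intensity[100:] = 1.0

x = np.arange(-20, 21, dtype=float)
operator = second_derivative_of_gaussian(x, sigma=4.0)
response = np.convolve(intensity, operator, mode="same")
edges = zero_crossings(response)

# Compare a DOG with the operator whose zero crossings coincide with the DOG's.
sigma_narrow, ratio = 3.0, 1.6
sigma_match = np.sqrt(2.0 * np.log(ratio) / (1.0 - 1.0 / ratio**2)) * sigma_narrow
xs = np.arange(-40, 41, dtype=float)
similarity = normalized_correlation(
    difference_of_gaussians(xs, sigma_narrow, ratio),
    second_derivative_of_gaussian(xs, sigma_match),
)
```

Here `edges` holds the sample index just before the step, and `similarity` is close to 1, which is the sense in which a DOG "closely approximates" the operator. A two-dimensional version would replace the second derivative by the Laplacian and look for sign changes along rows and columns.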

We can compute the width of the finest resolution channel at any eccentricity e. If we digitise
a text image, say at a resolution of 100 microns, we can compute the size of mask to use in a
computer program which precisely models the information available in the finest resolution channel at
eccentricity e. Examples of the result of applying this process can be found in figure 6.
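
The arithmetic behind this mask size calculation can be sketched as follows. Only the linear scaling of channel size with eccentricity and the 100 micron digitisation come from the text; the foveal width `w0_arcmin`, the eccentricity `doubling_ecc_deg` at which the width has doubled, and the viewing distance are left as parameters because they are assumptions a user would have to supply. The function names are hypothetical.

```python
import math

def channel_width_arcmin(w0_arcmin, ecc_deg, doubling_ecc_deg):
    """Channel width growing linearly with eccentricity.

    The width equals w0_arcmin in the fovea and has doubled at doubling_ecc_deg.
    """
    return w0_arcmin * (1.0 + ecc_deg / doubling_ecc_deg)

def arcmin_to_pixels(width_arcmin, viewing_distance_mm, pixel_mm=0.1):
    """Extent on the page, in pixels, subtended by a visual angle.

    pixel_mm defaults to 0.1 mm, that is, 100 micron digitisation.
    """
    theta = math.radians(width_arcmin / 60.0)
    size_mm = 2.0 * viewing_distance_mm * math.tan(theta / 2.0)
    return size_mm / pixel_mm

def mask_width_pixels(w0_arcmin, ecc_deg, doubling_ecc_deg,
                      viewing_distance_mm, pixel_mm=0.1):
    """Channel width at a given eccentricity, rounded to whole pixels."""
    width = channel_width_arcmin(w0_arcmin, ecc_deg, doubling_ecc_deg)
    return max(1, round(arcmin_to_pixels(width, viewing_distance_mm, pixel_mm)))
```

For instance, `mask_width_pixels(w0, e, e2, d)` gives the channel width in pixels for text viewed from `d` millimetres; the support of the mask actually applied to the image would normally be somewhat larger than this width.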

3.  The  isolation  of  words  in  text 
3.1  Introduction 

It is usual to equate early processing in reading with the extraction of character features, such as line
endings, T-junctions, holes, and concavities. We are presently more concerned with an even earlier
processing stage, namely the point at which the visual system first makes contact with (the gray level
intensities forming the image of) a portion of text. Let us suppose for the moment that the "reading
process" is already active. The work of Rayner (1975a, 1975b, 1977, 1978a, 1978b, 1979; Rayner and
McConkie, 1976; Rayner, McConkie, and Ehrlich, 1978; McConkie and Rayner, 1975) and others (see
for example McConkie (1979), O'Regan (1979), Levy-Schoen and O'Regan (1979)) on eye movements
demonstrates clearly that text is substantially processed before it is foveated. The extent to which
eye movement control is either (1) autonomous, being entirely determined by information computed
in early processing from the gray level array; or (2) capable of being explicitly controlled by
downward flowing task specific information, say, by knowledge of the syntax and semantics of the text
in question, is controversial. This is, of course, an instance of the issue raised in section 2.1 about
system organization.

The goal of reading may be supposed to be the efficient extraction of meaning from imaged text.
Given the nature of written language, particularly English, a presumably necessary primitive subgoal
is the isolation of words. In normal text, words are clearly separated by spaces which are substantially
wider than the spaces between individual letters. It would seem that the "program" controlling eye
movements could be trivial given a reasonable theory of the separation of words from inter-word
spaces such as that provided by the Marr-Hildreth theory outlined in the previous section. Evidence
in support of the contention that the control program is quite simple is easy to find. Firstly, it is
well known that inter word spaces, even when they are of varying width, are never foveated
(Levy-Schoen, 1979). Conversely, if spaces corresponding to word boundaries are randomly introduced
into previously elided text (as shown in figure 2), reading becomes exceptionally difficult. In this
situation, the inconsistent information provided by a simple space finding algorithm and its utilisation
by the processes which analyze the text, produce a complex pattern of foveations and a significant
increase in the duration of any individual foveation. Intermediate behavior results when inter-letter
spaces are made nearly equal to those between words.
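
A "simple space finding algorithm" of the kind alluded to above might look like the following sketch. It is a stand-in chosen for illustration, not the zero-crossing account developed in this paper: it treats a line of text as a small gray level array, marks the columns that contain any ink, and splits the line wherever a run of blank columns is at least `min_word_gap` wide. The function names, the `threshold` parameter, and the toy image are hypothetical.

```python
def column_has_ink(image, col, threshold=0):
    """True if any pixel in the column is darker than the threshold."""
    return any(row[col] > threshold for row in image)

def isolate_words(image, min_word_gap, threshold=0):
    """Return (first_col, last_col) extents of words in one line of text.

    Blank runs narrower than min_word_gap are treated as inter-letter
    spaces; wider runs are treated as inter-word spaces.
    """
    ncols = len(image[0])
    inked = [c for c in range(ncols) if column_has_ink(image, c, threshold)]
    if not inked:
        return []
    words = []
    start = prev = inked[0]
    for c in inked[1:]:
        if c - prev - 1 >= min_word_gap:
            words.append((start, prev))
            start = c
        prev = c
    words.append((start, prev))
    return words

# One-row toy image: '#' is ink, '.' is background.
line = "##.##....###.#"
image = [[1 if ch == "#" else 0 for ch in line]]
words = isolate_words(image, min_word_gap=3)
letters = isolate_words(image, min_word_gap=1)
```

With the gap threshold at three columns the toy line splits into two words, while a threshold of one column splits it at every inter-letter space. This is the sense in which making inter-letter spaces nearly equal to inter-word spaces would confuse so simple a procedure.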

However, as is equally well known, spaces are not unique in avoiding foveation. In particular,
function words such as "and" and "the" are rarely foveated. This partly explains the difficulty
difficulty we have in proofreading "Paris in the the spring" relative to this sentence as a whole. This
raises the ever present question: how "intelligent" does the eye movement controller need to be? Is
the word "the" omitted on the basis of information available in the parafovea, where individual letter
recognition is poor (Bouma, 1971), or alternatively does it rely on knowledge about the linguistic
context?


3.2  Fisher’s  results  on  reading  transformed  text 

In fact, the trivial word isolation process sketched above does not work in every circumstance in
which people can read quite easily. This was demonstrated in an elegant experiment performed by
Fisher (1975), building upon the earlier work of Smith (1969) and Hochberg (1970). Fisher used the
transformed texts illustrated in figure 3 to investigate the effect of manipulations of word shape and
word boundary on reading. Word shape was "manipulated" via three type variations: normal, all
upper case, and alternating upper and lower case letters. These are illustrated in samples one to three
of figure 3. Word boundaries were also "manipulated" in three ways: normal spacing, replacing an
inter-word space by the filler character "@", and elision to remove inter-word spaces. These
manipulations are illustrated for the upper case type variation in samples two, five, and eight of figure
3. In all, there are nine possible type and word boundary combinations, and they are shown in figure
3.

Figure 2. Text into which spaces have been randomly introduced after elision.

Fisher (1975, page 189) recorded the length of time taken by subjects to read nine paragraphs of
approximately equal length and complexity, whose texts had been randomly manipulated in the ways
described above. As a safeguard against skim reading without understanding, a subject was required
to answer a number of questions (typically four) about the passage just read, and was required to get a
certain number correct for the data point to be recorded. The results are presented in figure 4.

Fisher (1975, page 189) noted that the "interdependence of cues causes a reduction in reading
speed to nearly one third of the speed of the separate cue manipulations", and he suggested that
this "interdependence of word shape and word boundary cues tends to implicate higher order visual
processing than might be required simply for word identification" (Fisher 1975, page 190).

Figure 3. The nine type and boundary variations used by Fisher.

Figure 4. Fisher's results, reproduced from Fisher 1975, page 189.

3.3  The  role  of  early  visual  processing  in  the  isolation  of  words  in  text 

In the Introduction, we commented on the difficulty of devising and controlling processes which
embody an interaction between upward flowing and downward flowing information, and argued for
a model where early visual processing plays a bigger role. Since word isolation is clearly one of the
first steps in reading, we start by examining Fisher's results more closely, in the hope of discovering
an explanation of his findings without resorting to higher level cues. Firstly, the reading time per
word in sample seven is significantly lower than that in sample eight. This might be explained on the
grounds of the latter's lesser shape variability. However, sample nine has greater variability in shape
than sample eight, and yet the time to read eight is significantly lower than that for nine. Similarly,
there is greater variability in the shape of sample three than sample two, and yet the time to read
three is significantly greater. Clearly, one possible explanation is that in the absence of spaces, capital
letters can be used to signal word boundaries. According to this explanation, samples three and nine
provide information (random capitals) about word boundaries inconsistent with that discovered by
the processes which analyze the text. (Compare figure 2 and its discussion in the text.) It would
then follow that the eye guidance system can make the distinction between upper and lower case
characters and makes use of that information in isolating words.

This leads to our first empirical prediction: if the paragraphs used by Fisher are transformed
by first capitalizing the initial letter of each word and then eliding, so as to appear as in figure 5a,
the resulting text should be significantly easier to read than the elided text sample shown in figure
5b (compare sample seven in figure 4). This prediction forms experiment one. The experimental
details can be found in the next section. For the purposes of this section, it suffices to note that the
experiments were designed strictly in accordance with the method devised by Fisher (1975) to maximise
comparability with his results. Subjects were required to read texts which had been transformed in
various ways similar to those shown in figure 4. The average reading time per transformed word was
compared for significance between two variations. According to this metric, the phrase "significantly
easier to read" means that the reading time per word was significantly shorter.

It turns out that the capitalized elided text shown in figure 5a is indeed significantly easier (p <
0.01) to read than the elided normal text shown in figure 5b. This supports the hypothesis that we
are capable of distinguishing between upper and lower case characters on the basis of information
available in the parafovea. Significantly, however, it leaves open the precise details of the way in
which that distinction is made.

ItNowBecameEvidentThatTheCityMustBeAbandonedAtOnceThereWasADifferenceOf

ItnowbecameevidentthatthecitymustbeabandonedatonceTherewasadifferenceof

Figure 5. Typical data for Experiment one. Figure 5a: text which has been elided after capitalizing
the initial letter of each word. Figure 5b: elided normal text like that in sample seven of figure
4.


Some evidence bearing upon this distinction can be gleaned from the results for samples five,
six, eight, and nine in figure 4. Whereas sample five is significantly easier to read than sample eight,
there is insignificant difference between the ease of reading samples six and nine. This is a puzzle. The
advantage of sample five over sample eight suggests that we are capable of dynamically modifying our
eye movement control system to exploit the delimiter "@", and this contention is supported by the
significant advantage of sample four over sample seven. However, if we are capable of distinguishing
upper case characters and the character "@" in the parafovea in a way which is entirely robust and
reliable, we would expect to find a similar significant advantage for sample six over sample nine;
but we do not. One possible resolution of this puzzle would be to show that it is often difficult to
distinguish "@" and upper case characters when they are viewed in the parafovea. If that were so,
the use of "@" as a filler would give some advantage in sample five relative to sample eight, but the
advantage would be offset by the inconsistent information provided by fillers and text in sample six.

To investigate this question precisely, we need a detailed representation of the information
which is actually available in the parafovea. Fortunately, such a representation is now available,
having recently been developed by Marr and Hildreth (1980), and it was sketched in the previous
section. Figure 6 shows the result of applying the digitisation process described in that section to
sample five of Fisher's data (figure 4) at an eccentricity of four degrees. Figure 6b explicitly marks the
convolved "@" characters. It can be seen quite clearly that while some of them are relatively easy to
distinguish on the basis of shape, others (for example those marked in figure 6c) are not.

Figure 6. The result of convolving sample five of Fisher's data to show the information available
at 4°. Figure 6b: all instances of the character "@". Figure 6c: instances of the character "@"
which are difficult to distinguish on the basis of shape.
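The kind of computation behind such convolved images can be illustrated in one dimension. The following is a minimal sketch, not the program used to produce the figures: it assumes a 1-D intensity profile, uses the second derivative of a Gaussian as a 1-D stand-in for the Marr-Hildreth operator, and marks zero crossings of the response. The function names, the profile, and the values of `sigma` (in pixels) are all hypothetical; no mapping from `sigma` to eccentricity is implied.

```python
import math

def d2_gaussian_kernel(sigma):
    """Sampled second derivative of a Gaussian, a 1-D analogue of del-squared G."""
    half = int(math.ceil(4 * sigma))
    raw = [(x * x / sigma**4 - 1 / sigma**2) * math.exp(-x * x / (2 * sigma**2))
           for x in range(-half, half + 1)]
    mean = sum(raw) / len(raw)
    # Subtract the mean so the kernel sums to zero and uniform regions give no response.
    return [k - mean for k in raw]

def convolve(signal, kernel):
    """Same-length convolution; samples beyond the ends repeat the end values."""
    half = len(kernel) // 2
    last = len(signal) - 1
    return [sum(k * signal[min(max(i + j - half, 0), last)]
                for j, k in enumerate(kernel))
            for i in range(len(signal))]

def zero_crossings(response, eps=1e-6):
    """Indices i where the response changes sign between samples i and i + 1."""
    return [i for i in range(len(response) - 1)
            if (response[i] > eps and response[i + 1] < -eps)
            or (response[i] < -eps and response[i + 1] > eps)]

# Hypothetical profile: two 3-pixel strokes separated by a 4-pixel gap.
profile = [0.0] * 120
for start in (50, 57):
    for i in range(start, start + 3):
        profile[i] = 1.0

fine = zero_crossings(convolve(profile, d2_gaussian_kernel(1.0)))
coarse = zero_crossings(convolve(profile, d2_gaussian_kernel(6.0)))
print(len(fine), len(coarse))
```

With the narrow operator each stroke is bounded by its own pair of zero crossings; with the wide operator the two strokes are described by a single pair, which is the sense in which fine shape detail is lost at coarser scales.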

This evidence does indeed seem to show that it is often difficult to distinguish "@" and upper
case characters when they are viewed in the parafovea. We suggest that this resolves the puzzle
of Fisher's results discussed above without the need to postulate any downward flow of high level
information. It further suggests that while upper and lower case characters can be clearly and reliably
distinguished (in most fonts), the model of "upper case character" used by the early visual system
in guiding eye movements is actually quite crude. Tentatively we may suppose that the model of an
upper case character amounts to an assertion that they are relatively large compared to those in lower
case and have relatively lower curvature. This simple model normally serves the reader well, since
written text consists mostly of upper and lower case characters. However, being a simple model, it is
easily confused, and is particularly unreliable at making the distinction between upper case characters
and "@".

A number of predictions follow from this analysis. Firstly, it suggests that a font in which the
distinction between upper and lower case is difficult to make on the basis of size and shape would be
quite hard to read. Figure 7 shows such a font. Indeed, as we point out in the Conclusion, the analysis
here can be viewed as a first step towards making font design less subjective than it has been in the
past (see for example Spencer (1968)). Secondly, the analysis suggests that on the basis of the
information available in the parafovea, it would be difficult for the visual system to distinguish the capitalized
elided text shown in figure 8a and the text filled with "@" shown in figure 8b. This translates into a
prediction that there should be insignificant difference in the ease, that is to say speed per word, of
reading the samples in figure 8. Experiment 2 confirms this prediction; the relative advantage of one
sample over the other failing to reach significance at the 10% level.

An inquiry which mas just been
meid at Brighton once more il¬
lustrates tne Kina or leading
strings in wincm local munici¬
palities are kept. An inspector
or tne local Government Board
mas been molding a Kina or nun
-11c n lauest on ime proposal or
me bh grnon corporation to bor¬
row as.ooot. This enterprising
puniic body m its desire to in¬
crease tne attractions or the
great Sussex watermg-Piace
has resoiveato buy an estate

Figure 7. A font in which the distinction between upper and lower case would be difficult to
make. It is reproduced from Spencer (1968, page 16), from which we quote: "A new kind of type
proposed in the 1880's by Andrew Tuer in which 'the tailed letters projecting above or below the
line, have been docked' to provide maximum type size 'where economy of space is an object -
as in the crowded columns of a newspaper'".


The same computational argument can be turned around, in which case it leads to the prediction
that using a "visually striking" character as a filler would produce text that is significantly easier to
read than when "@" is used. Indeed, insofar as this can be shown empirically, it essentially enables us
to frame a precise definition of "visually striking". In Experiment 3, we compare the effect of using "\"
and "@" as fillers. The choice of "\" was quite deliberate. Figure 9 shows a sample of text which has
been digitised and convolved according to the Marr-Hildreth theory at a number of eccentricities in
the manner sketched earlier. Figure 9b shows the information available way out at 9° (corresponding
to about 36 letter spaces), and figure 9c shows the instances which every one of a group of five subjects
chose when they were instructed to simulate an unintelligent program to extract "\" from figure 9b.
Figure 9d illustrates the information available at 7°, and shows that the subjects correctly isolated each
and every instance of "\". Finally, figure 9e shows the information available at 4°. It is clear that the
early visual system could more easily and reliably find instances of "\" than "@", and so we are led
to predict that the Fisher-like sample of text shown in figure 9a would be significantly easier to read
than the same thing with "\" replaced by "@". Experiment 3 confirms this prediction. Indeed, in
Experiment 4, we compared the visually striking filler "\" and normal spacing (sample 1 of figure 4),
and we find that the relative advantage of normal spacing fails to reach significance even at the 10%
level.

ItNowBecameEvidentThatTheCityMustBeAbandonedAtOnceThereWasADifferenceOf

OpinionInRespectToTheHourOfDepartureTheDaytimeItWasArguedBySomeWouldBe

PreferableSinceItWouldEnableThemToSeeTheNatureAndExtentOfTheirDangerAndTo

It@now@became@evident@that@the@city@must@be@abandoned@at@once@There@was@a

Figure 8. a. Text sample in which words have been elided following capitalizing each initial letter.
b. Text in which spaces have been filled by "@" (compare Figure 4, sample 5).

The final experiment 5 is a tribute to the versatility of the computing facilities available for this
research. Consider the text sample in figure 10a, in which the forward slash character is used as a
delimiter. Since the downstrokes of ascender characters such as "b" and "f" slope slightly forwards
but not nearly so much as the slope of "/", we would expect a similar significant advantage for "/"
over "@". It turns out that this is the case. More interestingly, we were able to design a font in which
the only change compared to that of characters in figure 10a is that the forward slash character had
precisely the same slope as the downstroke of an ascender (see figure 10b). Figures 10c and 10d show the
convolved images of the samples in figures 10a and 10b respectively. The analysis developed above
leads us to predict that text samples of the form shown in figure 10a will be significantly easier to read
than those in the special font shown in figure 10b, though we might expect that there will be a reduced
advantage compared to that shown by "/" or "\" over "@". Experiment 5 confirms this prediction, the
significance being only at the 5% level.

Figure 9. a. Text sample in which "\" is used as a filler between words. b. Result of convolving
the sample in (a) to show the information available at 9°. c. Instances of "\" found in (b) by a
group of subjects simulating an unintelligent program. d. Information available in the convolved
image at 7° eccentricity. e. Information available to early visual processing at 4°.


4.  Experimental  details 

The experiments were designed strictly in accordance with the method devised by Fisher (1975)
to maximize comparability with his results.

Method. Twelve members of the Artificial Intelligence Laboratory who were naive with regard
to the purpose of the experiment took part.

The second paragraph from the Nelson-Denny
test used by Fisher. Spaces between words
have been replaced by a single \

Figure 10. a. Text sample filled with "/". b. Text sample in the special font in which the forward
slash character has precisely the same slope as the ascender of "d". c. Convolved image of (a) at
4°. d. Convolved image of (b) at 4°.


Materials. The nine paragraphs of the 1960 revised Nelson-Denny Reading Test (Denny 1960)
were used, together with three paragraphs of similar length (about 200 words) and complexity. The
Nelson-Denny texts were used by Fisher because they "had a very high degree of standardization
from high school through college aged adults" (Fisher 1975, page 189). A Times Roman Ift point font
was used throughout the experiments. There were several variations to the basic font:

(i) regular spacing between words ("normal").

(ii) all words elided together, that is, inter-word spacing removed.

(iii) words elided together after the initial letter of each word had been capitalised.

(iv) inter-word spaces filled by "@".

(v) inter-word spaces filled by "\".

(vi) inter-word spaces filled by "/".

(vii) inter-word spaces filled by a special character of the same slope as the descenders in the
font.

;; The second paragraph of the Nelson-Denny test with forward slashes
;; implanted in every space. This is the
;; regular space character in the italic
;; font which is times 12ital.
;; Notice that its slope is perceptibly
;; greater than that of down slopes of
;; characters such as l,h.
;; In this image it is convolved at wrp,

The experiments (1-5) described in the previous section were designed to compare the relative ease of
reading several pairs of the variations listed above. Specifically, the following hypotheses were tested:

(1) (ii) vs (iii): It was hypothesized that it would be significantly easier to read variation (iii) than
variation (ii).

(2) (iii) vs (iv): It was hypothesized that there would be insignificant difference between the ease
of reading variations (iii) and (iv).

(3) (iv) vs (v): It was hypothesized that it would be significantly easier to read variation (v) than
variation (iv). A similar hypothesis was that variation (vi) would show significant advantage over
(iv).

(4) (i) vs (v): It was hypothesized that there would be insignificant difference between the ease of
reading variations (i) and (v).

(5) (vi) vs (vii): It was hypothesized that it would be significantly easier to read variation (vi)
than variation (vii).

The variations (i) to (vii) were divided into two overlapping sets (i), (ii), (iii), (iv), (v) and (ii), (iii),
(iv), (vi), (vii). The subjects were divided into two groups of six and each group was associated with
one of the two sets of variations. Each subject had an individually prepared booklet consisting of the
twelve paragraphs. The booklets comprised two instances of paragraphs in three of the variations and
three instances of two of the variations. The choices of variations and the order of presentation of the
variations were counterbalanced over all subjects. "After each paragraph, a set of four multiple choice
questions was presented which had to be answered. The questions were taken from the Nelson-Denny
Reading Test. A digital clock graduated in [steps of 0.1 second] provided a display of the time to read
and was clearly visible to all subjects" (Fisher 1975, page 189).

Procedure. Each subject was given a page of instructions containing the variations of text which
would appear, the individually prepared booklet of twelve paragraphs, and a question and answer
sheet. "When subjects finished reading, they were to look at the time . . . they were then to turn the
page, answer the questions, and wait for instructions to go on to the next paragraph" (Fisher 1975,
page 189).

Results. As there was a substantial spread in the reading speed of the subjects, averaging the
data points over all subjects for a particular text produces an unacceptably large standard deviation.
As we are in fact most interested in the relative ease of reading two variations, the relevant hypothesis
for comparing one text variation $\alpha$ against another $\beta$ is the null hypothesis:

$$ \frac{t_\alpha}{t_\beta} = 1. $$

We can use the simple t statistic defined by

$$ t = \frac{r - 1}{s / \sqrt{n}} $$

where there are $n + 1$ subjects, $r$ is the mean of the individual values of $t_\alpha / t_\beta$, where $t_\alpha$ is the time taken
per word to read the paragraphs in variation $\alpha$, and $s$ is the standard deviation of that measure from $r$.
The actual results were given in the previous section.
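The ratio test described above can be sketched in a few lines. This is one plausible reading, not the analysis code used for the experiments: it assumes the statistic has the form t = (r - 1) / (s / sqrt(n)) with n + 1 subjects, and it assumes that s is computed by dividing the summed squared deviations by the number of subjects. The function name and the sample times are made up purely for illustration.

```python
import math

def ratio_t_statistic(times_a, times_b):
    """t statistic for the null hypothesis that the mean ratio t_a / t_b equals 1.

    times_a[i] and times_b[i] are the per-word reading times of subject i
    for the two text variations being compared.
    """
    ratios = [a / b for a, b in zip(times_a, times_b)]
    subjects = len(ratios)          # this is n + 1 in the notation of the text
    n = subjects - 1                # degrees of freedom
    r = sum(ratios) / subjects      # mean ratio
    # Assumption: s divides by the number of subjects, not by n.
    s = math.sqrt(sum((x - r) ** 2 for x in ratios) / subjects)
    return (r - 1) / (s / math.sqrt(n))

# Made-up per-word times (seconds) for three hypothetical subjects.
example_t = ratio_t_statistic([1.2, 1.1, 1.3], [1.0, 1.0, 1.0])
print(round(example_t, 4))
```

Working with per-subject ratios rather than raw times removes most of the between-subject spread in reading speed, which is the motivation given in the text for this choice of measure.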


5.  Conclusion 

This paper began by sketching the background against which this investigation of word isolation
in the parafovea has been conducted. Our aim has been to show how published empirical data,
especially that of Fisher (1975), could be accounted for using the rich theories of early visual processing
of the natural world which have recently been developed in Artificial Intelligence. On the basis of
a precise representation of the information available in the parafovea, we proposed an explanation
of Fisher's results by postulating a crude, though mostly reliable, model of upper versus lower case
characters. The same computational evidence led us to frame a number of predictions, each of which
was then confirmed by psychophysical experimentation. As a side effect, we were required to consider
how the idea of a character being "visually striking" might be made precise. This approach provides a
method for the study of legibility to add to those listed by Spencer (1968, page 21).

As we pointed out in the Introduction, this study is merely the first step on the long
haul towards understanding through computation the exquisite human skill of reading. The results
reported here have encouraged us to proceed to consider the next step in the process of acquiring
meaning from the sea of gray level intensities which form the image. We consider the next step to
be the problem of integrating information over successive saccades. Rayner's (1975a, 1975b, 1977,
1978a, 1978b, 1979; Rayner and McConkie, 1976; Rayner, McConkie, and Ehrlich, 1978; McConkie
and Rayner, 1975) work provides a rich background of empirical data for our study, which is intended
to exploit detailed computational models of natural vision in the manner of this paper. It is clear, for
example, that the notion of "word shape" needs to be made more precise by defining an appropriate
representation of the information available when a word is convolved at T. Rayner's (1975, page
76) finding that the first and last letters of a word (his NS condition) cause a significant increase in
foveation duration is entirely consistent with the approach pursued here. When two nearby lines are
convolved, they produce a smeared blob. This occurs not only for strokes within a character, but for
nearby strokes of two adjacent characters (see figure 11). Such inter-character smearing confounds
any process whose goal is to elicit structure within a word, and in particular to discover the precise
locations of its individual characters. The extremal characters are relatively unaffected by such
inter-character smearing, and hence the information gleaned at 4° will closely match that computed on a
subsequent (foveal) saccade. A similar argument applies to ascenders and descenders, so long as they
are relatively isolated. It is not inconceivable that we have learned that such shape information at the
extremities of words and from isolated ascenders and descenders within a word is preserved over a
typical 2° saccade, and have based our word representation scheme, which develops over several such
saccades, and the corresponding processes for eliciting substructure, upon it. Further study is needed
to make the representation and matching process precise.
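As a rough guide to when two nearby strokes smear into a single blob, one can idealise each stroke as a thin line blurred by a Gaussian of standard deviation $\sigma$. This is an illustrative assumption made here, not part of the analysis above: the operator in the Marr-Hildreth theory is $\nabla^2 G$ rather than $G$ itself, and real strokes have finite width. For two equal strokes a distance $d$ apart, the blurred profile and its curvature at the midpoint are

```latex
f(x) = \exp\!\left(-\frac{(x - d/2)^2}{2\sigma^2}\right)
     + \exp\!\left(-\frac{(x + d/2)^2}{2\sigma^2}\right),
\qquad
f''(0) = \frac{2}{\sigma^2}\left(\frac{d^2}{4\sigma^2} - 1\right)
         \exp\!\left(-\frac{d^2}{8\sigma^2}\right).
```

Under this idealisation the midpoint is a maximum, so the two strokes appear as one peak, whenever $d \le 2\sigma$; only for $d > 2\sigma$ does a dip appear between them. Since the blur grows with eccentricity, stroke pairs that are resolved foveally can be expected to merge in the parafovea.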

For the moment at least, we are left with a reasonably detailed model of eye movement control
whose goal is the isolation of words in text on the basis of the information which is available in the
parafovea.

1. We can reliably isolate spaces above a size which is yet to be determined, but is about one
character space in normal text. We assume that such spaces delimit words, and mostly this inference
serves us well. We are confused (and our reading is inhibited) when they do not.

If a space is located on either side of a blob which subtends a visual angle of roughly the same
size as an individual saccade, we initiate an eye movement to the beginning of the as yet unprocessed
blob. O'Regan's (1979) data give us some evidence on which to develop the details of this process.
In particular, the control may involve a crude representation of the sort discussed earlier for upper
case characters, in which case it would presumably be easy to confuse. Again, this requires detailed
investigation.

2. If spaces are not available, but words are delimited by some filler character, we dynamically
adjust our scanning strategy to locate instances of that filler. This requires that we first compute a
description of the appearance of the filler in the parafovea, and secondly that we search for instances
of the description in the convolved parafoveal image. This strategy is reliable to the extent that the
filler is "visually striking", that is to say, its instances can reliably be extracted from the available
information. The backwards and forwards slash characters are visually striking in this sense; the
"@" sign is less so. It is to be expected that the first foveation of text in which spaces are routinely
filled in this way would be considerably longer than subsequent ones (there is some evidence that this
is generally true, see Levy-Schoen 1979, page 12). It may be conjectured that this can be explained on
the basis of the considerations discussed in this paper.

Figure 11. The smearing of nearby lines by convolution is illustrated for strokes within a character
(marked "a") and between two characters ("b").

In particular, our model leads to the following prediction. Consider a text sample which consists
of a sequence of "segments", each of which can be several words long and is associated with a
particular filler character. For example, a segment filled with "\" might be followed by a segment filled
with "/" and so on. We would expect that there would be a significant increase in the duration of
foveations at the boundary between two segments as the parafoveal processing fails to discover an
instance of its currently "loaded" filler, and has to locate and load the description of the filler for the
next segment.

3. We distinguish between upper and lower case characters on the basis of size and lower
curvature only. Capital letters mark important linguistic events in English, such as proper names and the
beginnings of sentences. As before, we assume that this importance has been translated into a coarse
description which often can be reliably computed in the parafovea. While it often serves us well in
isolating upper case characters and drawing our attention to the corresponding linguistic event, it is a
coarse description and is easily confused.
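The first component of the model, isolating words by finding sufficiently wide spaces, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the control program itself: the text line is reduced to a hypothetical one-dimensional list `ink` that records which columns contain any ink, and `min_gap` is a free parameter standing for the space size that the model leaves to be determined.

```python
def isolate_blobs(ink, min_gap):
    """Return (start, end) column pairs of blobs separated by wide gaps.

    ink     -- sequence of booleans, True where a column contains ink
    min_gap -- smallest run of empty columns treated as a word boundary
    The end index is exclusive.
    """
    blobs = []
    start = None      # first inked column of the blob being built
    last_ink = None   # most recent inked column seen
    for i, has_ink in enumerate(ink):
        if not has_ink:
            continue
        if start is None:
            start = i
        elif i - last_ink - 1 >= min_gap:
            # The empty run since the last inked column is wide enough: close the blob.
            blobs.append((start, last_ink + 1))
            start = i
        last_ink = i
    if start is not None:
        blobs.append((start, last_ink + 1))
    return blobs

# Toy line: one-column gaps stand for letter spacing, the two-column gap for a word space.
line = [ch != " " for ch in "ab cd  ef"]
print(isolate_blobs(line, min_gap=1))
print(isolate_blobs(line, min_gap=2))
```

With the smaller threshold every gap is treated as a boundary; with the larger one only the wide gap separates blobs. This mirrors the observation that reading degrades when inter-letter spaces approach the width of inter-word spaces, since no single threshold then separates the two.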

Other work, not reported in detail here, shows a slight though not statistically significant
advantage over sample seven in figure 4 for a word sequence in which words are alternately printed in
a roman font and in italics. This effect is less than that which occurs when bold font is alternated
with regular roman. This is consistent with the findings of legibility research. Various researchers,
including Tinker (1955), have found that italics actually retard reading, and that readers mostly do not
like italics. Tinker (1955) found that 96% of his adult subjects were of the opinion that they could read
lower case roman more easily than italics.

this  study  assumes  that  the  word  isolation  process  is  already  activated  at  the  time  when  the 
im  is  initially  encountered,  and  it  might  be  thought  that  high  level  knowledge  would  he  required  to 
elicit  this  .h.  ovation,  figure  1 2c  shows  a  sample  of  text  (figure  12a)  convolved  w  itli  a  mask  sue  which 
i  ui responds  to  fov  cation  at  a  distance  of  5.8.1  metres.  The  regular  texture  ot  lines  ol  blobs  is  quite 
clear even though it is impossible to make any sense of the text. In short the image looks like a text
seen at a distance, as does the image in figure 12g, although in this case it is in fact the convolution
of the image shown in figure 12d. Once again, the theory being advanced here is that we interpret a
particular image as a piece of text on the basis of quite a crude representation, which, however, mostly
serves us well.
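
The step of convolving with a coarse mask and locating zero crossings can be illustrated in one
dimension. What follows is a minimal sketch in Python, not the implementation used to produce figure 12.
It assumes a single scan line in which ink is 1 and paper is 0, and it uses the second derivative of a
Gaussian as a one-dimensional stand-in for the Marr-Hildreth mask. The stroke widths, the spacings, the
value sigma = 6 and all function names are hypothetical choices made only for illustration.

```python
import math


def second_derivative_of_gaussian(sigma):
    """Sampled 1-D stand-in for the Marr-Hildreth mask (unnormalised)."""
    half = int(math.ceil(4 * sigma))
    return [((x * x - sigma ** 2) / sigma ** 4) * math.exp(-x * x / (2.0 * sigma ** 2))
            for x in range(-half, half + 1)]


def convolve_same(signal, kernel):
    """Convolve, treating everything beyond the ends of the line as blank paper (0)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        total = 0.0
        for j, k in enumerate(kernel):
            t = i - (j - half)
            if 0 <= t < len(signal):
                total += signal[t] * k
        out.append(total)
    return out


def negative_runs(response):
    """Half-open intervals where the response is negative; their ends are zero crossings."""
    runs, start = [], None
    for i, value in enumerate(response):
        if value < 0 and start is None:
            start = i
        elif value >= 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(response)))
    return runs


def make_scan_line(word_lengths, stroke=3, letter_gap=2, word_gap=10, margin=12):
    """Toy scan line (assumed geometry): returns (signal, true word extents)."""
    signal, extents = [0] * margin, []
    for n, letters in enumerate(word_lengths):
        if n:
            signal += [0] * word_gap
        start = len(signal)
        for m in range(letters):
            if m:
                signal += [0] * letter_gap
            signal += [1] * stroke
        extents.append((start, len(signal)))
    signal += [0] * margin
    return signal, extents


# Three "words" of four, three and five letters.
signal, true_extents = make_scan_line([4, 3, 5])
response = convolve_same(signal, second_derivative_of_gaussian(sigma=6.0))
blobs = negative_runs(response)
print("true word extents:", true_extents)
print("detected blobs:   ", blobs)
```

With a mask this coarse relative to the letter spacing, each word of the toy scan line yields a single
negative run whose ends lie within a pixel or two of the true word boundaries, although nothing about
the individual letters survives.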

We conclude with one final remark on the notion that the ease with which a text can be read
is directly related to the ease with which information can be reliably computed from its convolved
image, and it concerns font design. A great deal of research on font design (see for example Spencer
1968) is depressingly subjective. Recently, however, Julesz (1980) and his colleagues have begun a

Figure 12. a. A sample of text displayed, after photoscanning at a resolution of IvM) microns,
using a pseudo grey level system devised and constructed by Berthold Horn. b. The result of
convolving the text in figure 12a with a mask whose central panel width is Vi. This corresponds
to foveating the text at a distance of 5.81 metres. c. Zero crossings of the convolution shown in
figure 12b. The pattern of blobs corresponding to words is evident. d. A set of random marks
produced by filling in the regions which arose from having iiiuml the text sample given in figure
12a. e. A number of cross sections of the intensity profile shown in figure 12d in the x and y
directions. f. The result of convolving the image shown in figure 12e in the same way as figure
12b. g. The zero crossings of the convolution in figure 12f. The result is quite similar to figure
12c.


study which is analogous to that pursued here. They apply their ideas about texture discrimination to
define a set of so called "textons" and then advocate the design of fonts based on the discriminability
of textons. Our approach also relates the legibility of a font to the processes of natural perception,
but we are currently more concerned with understanding the perceptual basis of the efficacy of using
serifs and so forth than with the aesthetics of font design. There is nevertheless a good deal of
similarity between our goals. Much more work is necessary to develop the ideas sketched in this
section into a coherent and precise theory.